Speech-Gesture Driven Multimodal Interfaces for Crisis Management
Authors
Abstract
Emergency response requires strategic assessment of risks, decisions, and communications that are time-critical, while requiring teams of individuals to have fast access to large volumes of complex information and technologies that enable tightly coordinated work. Access to this information by crisis management (CM) teams in emergency operations centers can be facilitated through various human-computer interfaces. Unfortunately, these interfaces are hard to use, require extensive training, and often impede rather than support teamwork. Dialogue-enabled devices based on natural, multimodal interfaces have the potential to make a variety of information technology tools accessible during crisis management. This paper establishes the importance of multimodal interfaces in various aspects of crisis management and explores many issues in realizing successful speech-gesture driven, dialogue-enabled interfaces for CM. The paper is organized in five parts. The first part discusses the needs of CM that can potentially be met by the development of appropriate interfaces. The second part discusses issues related to the design and development of multimodal interfaces in the context of CM. The third part discusses the state of the art in both the theories and practices involving these human-computer interfaces. In particular, it describes the evolution and implementation details of two representative systems: XISM and the Dialog Assisted Visual Environment for Geoinformation (DAVE_G). The fourth part speculates on short-term and long-term research directions that will help address the outstanding challenges in interfaces that support dialogue and collaboration. Finally, part five concludes the paper.

Correspondence Information: Dr. Rajeev Sharma, Department of Computer Science and Engineering, 220 Pond Laboratory, Pennsylvania State University, University Park, PA 16802. Email: [email protected]. Phone: (814) 867-8977. Fax: (814) 867-8957.

Submitted to the Proceedings of IEEE special issue on Multimodal Human-Computer Interface

Part I: Need for Multimodal Interfaces in Crisis Management

The need to develop information science and technology to support crisis management has never been more apparent. The world is increasingly vulnerable to sudden hazardous events such as terrorist attacks, chemical spills, hurricanes, tornadoes, floods, wildfires, and disease epidemics. Emergency response requires strategic assessment of risks, decisions, and communications that are time-critical, while requiring teams of individuals to have fast access to large volumes of complex information and technologies that enable tightly coordinated work. Access to this information by crisis management teams in emergency operations centers is through various human-computer interfaces that, unfortunately, are hard to use, require extensive training, and often impede rather than support teamwork. Meeting the challenges of crisis management in a rapidly changing world will require more research on fundamental information science and technology. To have an impact, that research must be linked directly with the development, implementation, and assessment of new technologies. Making information technology easier to use for crisis managers and related decision makers is expected to increase the efficiency of coordination and control in strategic assessment and crisis response activities. To be useful and usable, the interface technologies must be human-centered, designed with input from practicing crisis management personnel at all stages of development.
Crisis management scenarios (see Figure 1 for an example scenario) considered in this paper include both strategic assessment (work to prepare for and possibly prevent potential crises) and emergency response (activities designed to minimize loss of life and property). Most crisis management relies upon geospatial information (derived from location-based data) about the event itself, its causes, the people and infrastructure affected, the resources available to respond, and more. Geospatial information is essential for pre-event assessment of risk and vulnerability as well as for response during events and subsequent recovery efforts. Crisis management also relies upon teams of people who need to collaboratively derive information from geospatial data and to coordinate their subsequent activities. Current geospatial information technologies, however, have not been designed to support group work, and we have very little scientific understanding of how groups (or groups of groups) work in crisis management using geospatial information and the technologies for collecting, processing, and using it. We believe that dialogue-enabled devices based on natural, multimodal interfaces have the potential to make a variety of information technology tools accessible during crisis management. Multimodal interfaces allow users to interact via a combination of modalities, for instance speech, gesture, pen, touch screens, displays, keypads, pointing devices, and tactile sensors. They offer the potential for considerable flexibility, broad utility, and use by a larger and more diverse population than ever before. A particularly advantageous feature of multimodal interface design is its ability to support superior error handling compared to unimodal recognition-based interfaces, in terms of both error avoidance and graceful recovery from errors [1-4]. Traditional human-computer interfaces, however, do not support the collaborative decision making involved in crisis management.
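The kind of speech-gesture combination described above can be illustrated with a minimal sketch. All names, thresholds, and the binding rule below are illustrative assumptions, not the actual XISM or DAVE_G implementation: a deictic word in the spoken command ("here", "this") is bound to the temporally nearest gesture to complete a map command.

```python
from dataclasses import dataclass

@dataclass
class GestureEvent:
    kind: str    # e.g. "point" or "circle" (hypothetical labels)
    x: float     # map coordinates indicated by the gesture
    y: float
    t: float     # timestamp in seconds

def integrate(utterance, utterance_t, gestures, window=1.5):
    """Bind a deictic word in the spoken command to the temporally
    nearest gesture within `window` seconds, producing a complete
    map-command frame; return None if the reference cannot be resolved."""
    deictic = any(w in ("here", "this") for w in utterance.lower().split())
    if not deictic:
        return {"command": utterance, "location": None}
    nearby = [g for g in gestures if abs(g.t - utterance_t) <= window]
    if not nearby:
        return None  # deictic word with no gesture: dialog manager should prompt
    g = min(nearby, key=lambda g: abs(g.t - utterance_t))
    return {"command": utterance, "location": (g.x, g.y)}

# "Zoom in here" spoken at t=10.2 s, with a pointing gesture at t=10.0 s:
print(integrate("zoom in here", 10.2,
                [GestureEvent("point", -117.16, 32.72, 10.0)]))
# → {'command': 'zoom in here', 'location': (-117.16, 32.72)}
```

Note how the same framework lets the system detect an incomplete command (a deictic word with no accompanying gesture) and recover gracefully by prompting the user, rather than failing silently.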
The ability to develop a multimodal interface system depends on knowledge of the natural integration patterns that typify people's combined use of different input modes. Developing a multimodal interface for collaborative decision-making requires systematic attention to both human and computational issues at all stages of the research. The human issues range from analysis of the ways in which humans indicate elements of a geographic problem domain (through speech and gesture) to the social aspects of group work. The computational issues range from developing robust real-time algorithms for tracking multiple people, recognizing continuous gestures, and understanding spoken words, to developing methods for syntactic and semantic analysis of speech/gesture commands and designing an efficient dialog-based natural interface in the geospatial domain for crisis management. Given the complex nature of users' multimodal interaction, a multidisciplinary approach is required to design a multimodal system that integrates complementary modalities to yield a highly synergistic blend. The main idea is to consider each of the input modalities in terms of the others, rather than separately. The key to success is meeting the integration and synchronization requirements for combining different modes strategically into a whole system. A well-designed multimodal architecture can support mutual disambiguation of input signals [5]. Mutual disambiguation involves recovery from unimodal recognition errors within a multimodal architecture: because semantic information from each input mode supplies partial disambiguation of the other mode, overall system performance becomes more stable and robust.
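A minimal late-fusion sketch of mutual disambiguation follows. The confidence values and the compatibility rule are invented for illustration and do not reflect the paper's actual architecture: each (speech, gesture) hypothesis pair is scored jointly, so gesture evidence can overturn the speech recognizer's top-ranked but semantically implausible hypothesis.

```python
from itertools import product

def fuse(speech_hyps, gesture_hyps, compatible):
    """Late-fusion sketch: score every (speech, gesture) hypothesis pair
    by the product of the unimodal recognizer confidences, discarding
    semantically incompatible pairs so each mode disambiguates the other."""
    best, best_score = None, 0.0
    for (s, ps), (g, pg) in product(speech_hyps.items(), gesture_hyps.items()):
        if compatible(s, g) and ps * pg > best_score:
            best, best_score = (s, g), ps * pg
    return best, best_score

# Speech n-best list: the recognizer slightly prefers the acoustically
# similar "zoom near"; the gesture recognizer saw a pointing gesture.
speech = {"zoom here": 0.4, "zoom near": 0.6}
gesture = {"point": 0.7, "circle": 0.3}

# Illustrative semantic rule: "zoom near" takes no co-occurring deictic
# gesture, while "zoom here" requires one.
def compatible(s, g):
    return s != "zoom near"

best, score = fuse(speech, gesture, compatible)
print(best, round(score, 2))
# → ('zoom here', 'point') 0.28
```

Speech alone would have chosen the wrong command ("zoom near", confidence 0.6); the co-occurring pointing gesture vetoes it, recovering the intended "zoom here" — the essence of mutual disambiguation.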
This integration is useful both in the disambiguation of the human input to the system and in the disambiguation of the system output. This paper discusses the evolution and implementation of dialog-based, speech-gesture driven multimodal interface systems developed by a group of researchers at the Pennsylvania State University and Advanced Interface Technologies (AIT). The main goal was to design HCI systems that allow a team of individuals to collaborate while easily and naturally interacting with complex geospatial information. The unified multimodal framework would include two or more people in front of a large display, agents in the field with small displays, as well as mobile robotic agents. Such a multimodal, cross-platform collaborative framework could be an important element for rapid and effective response to a wide range of crisis management activities, including homeland security emergencies. The objectives of this paper are:
i. To outline how cutting-edge information technologies, for example a speech-gesture driven multimodal interface, allow individuals as well as teams to access essential information more quickly and naturally, thus improving decision making in crisis situations.
ii. To discuss the challenges faced in designing such a system, which include identifying and responding to the critical needs of crisis mitigation and response, and providing the crisis management team with a distributed environment for training and testing, including a virtual space for distant members to collaborate in decision making.
iii. To discuss the state of the art of speech-gesture driven collaborative systems and the technological issues involved in the design of speech-gesture based interfaces. This includes speech and image analysis tasks for sensing, a multimodal fusion framework for user action recognition, and dialog design and semantics issues in the domain of crisis management.
iv. To report our progress to date, detailing the evolution of two implemented systems, namely XISM and DAVE_G.
v. To discuss the future challenges that must be overcome to realize natural and intuitive interfaces for collaborative decision making in the context of crisis management.

A Crisis Management Scenario: Let us consider an example scenario that could help in grounding the discussion of the role of multimodal interfaces in collaborative work for crisis management (see Figure 2 for a conceptual snapshot of the problem). Imagine the crisis management center of a government organization, with Center Director Jane Smith and Paul Brown, chief logistics and evacuation manager, in front of a large-screen display linked to the organization's emergency management system, called the Multimodal Interface for Collaborative Emergency Response (MICER). "An earthquake of magnitude 7.1 has hit San Diego and many freeways and major roads are impassable. Buildings are severely damaged or collapsed and fire has broken out in many places. Shortly before the quake, seismographs indicated a fault shift and triggered alarms at emergency centers and local governments. A few minutes later, emergency operation centers are occupied and prepared to respond to
Similar Resources
Project highlight: GeoCollaborative crisis management
The major natural disasters that occurred in the last few months have shown the importance and necessity for collaborative, international crisis management. Current geoinformation technologies are potentially powerful tools for mitigation, preparation, response, and recovery tasks in crisis situations; however, they fail to support group work and have typically been designed without scientific ...
Multimodal Human-Computer Interaction for Crisis Management Systems
This paper presents a multimodal crisis management system (XISM). It employs processing of natural gesture and speech commands elicited by a user to efficiently manage complex dynamic emergency scenarios on a large display. The developed prototype system demonstrates the means of incorporating unconstrained free-hand gestures and speech in a real-time interactiv...
Human-GIS Interaction Issues in Crisis Response
Geospatial information systems (GIS) provide a central infrastructure for computer supported crisis management in terms of database, analytical models and visualization tools, but the user interfaces of such systems are still hard to use, and do not address the special needs of crisis managers who often work in teams and make judgments and decisions under stress. This paper articulates the over...
Unification-based Multimodal Integration
Recent empirical research has shown conclusive advantages of multimodal interaction over speech-only interaction for map-based tasks. This paper describes a multimodal language processing architecture which supports interfaces allowing simultaneous input from speech and gesture recognition. Integration of spoken and gestural input is driven by unification of typed feature structures representing ...
A Salience-Based Approach to Gesture-Speech Alignment
One of the first steps towards understanding natural multimodal language is aligning gesture and speech, so that the appropriate gestures ground referential pronouns in the speech. This paper presents a novel technique for gesture-speech alignment, inspired by salience-based approaches to anaphoric pronoun resolution. We use a hybrid between data-driven and knowledge-based methods: the basic str...
Journal title:
Volume, Issue:
Pages: -
Publication date: 2001